Chapter 1. OVERVIEW


Software maintenance of computer systems is an essential task. It
is also a difficult and expensive activity: more time and money are
spent on software maintenance than on software development. Current
demands therefore call for good tools for evaluating software
during maintenance. Maintenance tasks would be simplified by
knowing which modules are most susceptible to change and which
ones should be rewritten.

Literature Survey

The maintenance process has not been sufficiently well explored.
Some methods exist for predicting development characteristics of
the maintenance process. No single technique can hope to solve the
maintenance problem, which will remain a challenge to produce
greater flexibility and longer life-cycles [PARR:Pa79]. By giving
explicit attention to characteristics of both software quality and
requirements for long-term maintenance, we can produce significant
savings in software life-cycle costs [BROWN:Bo76 and GILB:Gi79].
Application of improved development techniques has emphasized the
need for improved techniques for requirements analysis and
specification. The use of these techniques and their relationship
to each other are often not clear [FREEMAN:Fr79], and the need for
continuing maintenance and change of software is not primarily due
to a lack of foresight or to poor planning [LEHMAN:Le79]. An
understanding of the maintenance process might be based on working
with real software to improve the efficiency of maintenance
[PARR:Pa79]. The quantitative approach provokes us to ask better
questions about the known effects of the various technological
alternatives [GILB:Gi79].

There are two different approaches for assessing software
maintainability. One is based on the extent to which program
difficulty represents the sum of the difficulties of its
constituent elements of software [BERNS:Be84 and HALSTEAD:E176].
The other is based on a quantitative evaluation of software quality
by collecting experience data in a form suited to our future and
common needs [GILB:Gi79 and BOEHM:Bo76]. In the former case, the
elements of software are quantified by attributes and
interrelationships in order to check program difficulty or
understandability rather than usability, reliability, and
modifiability. In the latter case, empirical data must be collected
from ongoing maintenance processes in order to measure the nature
of the software. The nature of maintenance work suggests that
empirical analyses are the most appropriate in leading us to a
greater understanding of the structure of large software systems
[PARR:Pa79 and HENDERSON:He79]. These analyses may form one of the
formal methodologies for the development of quality software
[A B Marmor-Squires:Ma79].

This research is based on the development of a maintenance measure
which was specified in the software measures research by Dr.
Gustafson. That research has two fields of study. One is to develop
a maintenance theory of changes and derive a method of predicting
changes from the theory. The other is to develop a maintenance
measure that depends upon empirical data of software changes, with
the empirical data obtained from the source code. This study uses
the latter approach. The maintenance measure of this study could
support maintenance tasks.

We conducted an experiment to investigate changes between System3
and System5 of Unix. System3 is the older version, and System5 is
the newer version, which was created from System3. The experiment
was to analyze the differences between the two versions. All the C
modules were processed by our analysis programs. The differences
were studied as changes to the older version. The changes were
analyzed using several statistical packages to find relationships
among the changes. The results support the development of a
maintenance tool that could be used to predict the modules most
likely to be changed. The ability to predict where changes will
occur during maintenance and enhancement could minimize the extent
of changes and reduce the maintenance cost.

This thesis includes an explanation of the data collection and
analysis, a discussion and interpretation of the results, a
statement of conclusions, and suggestions for future extensions.
Chapter 2. DATA COLLECTION AND ANALYSIS


The first step of our research was to analyze the relation of
changes between the C modules of Unix System3 and System5. The
analysis performed in this research was aimed at better
understanding maintenance and at developing predictive methods for
the maintenance process. The sequence of the analysis was as
follows. First, definitions about changes were developed. Second,
123 pairs of modules were chosen, consisting of 35,464 lines of
code in System3 and 46,023 lines of code in System5. Third,
programs were written to collect all the information. Fourth, the
empirical data were analyzed by using several statistical packages.

2.1 Developing Definitions

Some specialized definitions were developed for the analysis
according to the software measures research. The definitions for
the possible changes of the modules were developed and evaluated.
The chosen definitions are given below:

1) changes[type] : number of statements of the specified type that
   have been changed.

2) change percent[type] : percentage of the type that has changed.

   change percent = (changes[type] / total number) * 100

3) average nesting level : the average level of nesting for the
   statements in a module.

   average nesting level = (SUM {i = 0 to n} of i * nl_i) / LOC

   nl_i : the number of statements at nesting level i
   LOC  : lines of code
   SUM  : summation

4) weight[type] : the number of statement changes for each
   statement type in the initial study [Gu85].

   weight = SUM {i = 0 to k} of (weight[type] * x_i)

   x_i : the number of occurrences of the statement type

5) weight / LOC : the number of statement changes per line.
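
The definitions above can be illustrated with a minimal sketch. It is
not the thesis's shell programs; the function names, the per-type
weights, and the statement counts below are hypothetical values chosen
only to show how the formulas combine.

```python
from typing import Dict, Sequence


def change_percent(changed: int, total: int) -> float:
    """change percent = (changes[type] / total number) * 100."""
    return 100.0 * changed / total if total else 0.0


def average_nesting_level(statements_per_level: Sequence[int], loc: int) -> float:
    """(SUM over i of i * nl_i) / LOC, where nl_i is the count at level i."""
    return sum(i * n for i, n in enumerate(statements_per_level)) / loc


def module_weight(type_weights: Dict[str, float], type_counts: Dict[str, int]) -> float:
    """SUM over statement types of weight[type] * occurrences of that type."""
    return sum(type_weights.get(t, 0.0) * c for t, c in type_counts.items())


# Hypothetical per-type weights and statement counts for one module.
weights = {"if": 0.30, "while": 0.20, "return": 0.10}
counts = {"if": 4, "while": 2, "return": 3}
loc = 10

weight = module_weight(weights, counts)   # 0.30*4 + 0.20*2 + 0.10*3
weight_per_line = weight / loc            # definition 5: weight / LOC
```

With these made-up inputs the module weight is the sum of three
products, and dividing by the lines of code gives the weight per line
used as a separate variable later in the analysis.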

These terms were used to assess the modules in our empirical data
in terms of the lines of code (LOC), weight, weight per line, and
average nesting level. The lines of code show how many lines a
module has, that is, how big it is. The weights represent the
change percentages for the program statements in each module. The
weight[type] of each statement type was measured in the original
research paper [Gu85] to see which statement types are most likely
to be changed. The weight of a module is computed from the number
of statements of each type and the change percent of that type, so
the weight was used as a possible measure for predicting further
changes. The average level of nesting for the statements in a
module is determined by the number of tabs that indent each line.
The average nesting level therefore represents the indentation
level per line and indicates how deeply a module is nested.

2.2 Choosing C Modules

All the C modules in System3 and System5 which had the same names
in both directories were processed. Other modules were considered
as improvements or changes of the system capabilities and not
merely as maintenance. The total number of C modules in each system
was 140. Of these, 123 modules were chosen for our empirical study.
The other seventeen modules in each system did not have
counterparts in the other system.
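
The selection rule described here (keep only modules with the same
name in both directories) can be sketched as follows. This is an
illustrative assumption about how such a pairing could be done, not
the procedure actually used; the function names are hypothetical, and
counting physical lines is only one possible reading of "lines of
code".

```python
from pathlib import Path
from typing import List, Tuple


def paired_modules(old_dir: str, new_dir: str) -> List[Tuple[Path, Path]]:
    """Return (old, new) path pairs for C files present under both trees."""
    old_root, new_root = Path(old_dir), Path(new_dir)
    # Key each file by its path relative to the tree root, so that a
    # module only pairs with the file of the same name and location.
    old = {p.relative_to(old_root): p for p in old_root.rglob("*.c")}
    new = {p.relative_to(new_root): p for p in new_root.rglob("*.c")}
    return [(old[key], new[key]) for key in sorted(old.keys() & new.keys())]


def line_count(path: Path) -> int:
    """Count physical lines, the LOC notion assumed in this sketch."""
    with path.open(errors="replace") as handle:
        return sum(1 for _ in handle)
```

Files that appear in only one tree are left out of the result, which
mirrors the decision to treat unmatched modules as capability changes
rather than maintenance.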
2.3 Writing Programs

For the analysis, several programs were written as shell programs
in Unix. The programs were designed to collect the information for
the different stages of each module. First, the indentation program
[Appendix A.1] counts the tabs of each line and calculates the
average number of tabs, which is defined as the average nesting
level. Each line is classified into level zero to six based on the
number of tabs. Second, the program [Appendix A.2] for nesting
levels searches for all the program reserved words and gathers the
word counts for each module. The other part of the program
[Appendix A.3] quantifies the weight according to the counts. The
main program generates the lines of code, weight, and weight per
line. A processing time of five hours was needed to run these
programs over the two whole directories of Unix.
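
The tab-counting step can be sketched in a few lines. The original
programs were shell scripts listed in the appendices; the Python below
is only an approximation under stated assumptions: blank lines are
skipped and lines indented by more than six tabs are counted at level
six. Neither choice is confirmed by the text.

```python
from typing import List, Tuple

MAX_LEVEL = 6  # the text classifies each line into level zero to six


def nesting_histogram(source: str) -> Tuple[List[int], float]:
    """Return (lines per nesting level, average nesting level)."""
    levels = [0] * (MAX_LEVEL + 1)
    # Assumption: blank lines carry no nesting information and are skipped.
    lines = [line for line in source.splitlines() if line.strip()]
    for line in lines:
        tabs = len(line) - len(line.lstrip("\t"))  # leading tabs only
        levels[min(tabs, MAX_LEVEL)] += 1           # assumption: clamp at six
    weighted = sum(level * count for level, count in enumerate(levels))
    average = weighted / len(lines) if lines else 0.0
    return levels, average


sample = "int f()\n{\n\tif (x)\n\t\treturn 1;\n\treturn 0;\n}\n"
histogram, average = nesting_histogram(sample)
```

For the six-line sample, three lines sit at level zero, two at level
one, and one at level two, so the average is the weighted sum of the
levels divided by the number of lines.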

2.4 Analyzing Data by Using the Statistical Packages

Three statistical packages were used to analyze the empirical data
obtained from the C modules of System3 and System5. The analysis
proceeded in three steps:

First step: The possible relations among our specialized terms
    were expanded and all the values were calculated by the shell
    programs. 18 variables and 123 cases were created for the next
    step. The empirical data were manipulated in Excel, which is an
    advanced worksheet package for the Macintosh.

Second step: We used Macspin to find what sorts of relations
    exist. Macspin is a statistical analysis tool designed for
    high-performance interaction with multivariate data. We
    checked all the relations in our data by using graphical
    displays. We could find some relations and estimate the
    patterns through three-dimensional scatterplots of the data.
    The second step gave us the possible relationships that were
    quantified in step three.

Third step: The 18 variables and 123 cases (modules) were analyzed
    by Statfast, which is a general statistical package. The
    package performed statistical procedures such as the t-test,
    the F-test, correlations, and multiple regression. The
    multiple regression was used to perform relative analyses on
    the data. This analysis allowed us to see the means, the
    standard deviations, and the correlation matrix. The matrix of
    correlations was displayed as a table of 18 by 18 variables,
    from which we could observe the minimum relationships. In the
    stepwise regression analyses, an F-test for the one dependent
    variable and t-tests for the several independent variables
    were evaluated as hypothesis tests to determine whether or not
    we can be reasonably confident that the variables are related.
    The multiple regression analysis was used to find out, by
    these tests, which of the several independent variables are
    strong predictors. We obtained valuable results by repeating
    this step for different dependent variables. All the results
    were processed through the three steps.
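
The kind of computation performed in the third step can be sketched
with ordinary least squares. Nothing here reproduces Statfast, whose
internals the text does not describe; the sketch simply fits
Y = A + B1*X1 + ... + Bk*Xk through the normal equations and reports
the overall F statistic from R squared. All function names are
hypothetical, and the sketch assumes an imperfect fit so that the F
ratio is defined.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a*x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def multiple_regression(
    xs: Sequence[Sequence[float]], y: Sequence[float]
) -> Tuple[List[float], float, float]:
    """Fit y = A + B1*x1 + ... + Bk*xk; return ([A, B1..Bk], R^2, F)."""
    n, k = len(y), len(xs[0])
    design = [[1.0] + list(row) for row in xs]  # leading 1.0 gives the intercept
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k + 1)]
           for i in range(k + 1)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k + 1)]
    coef = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    # Overall F with k and n - k - 1 degrees of freedom.
    f_stat = (r2 / k) / ((1.0 - r2) / (n - k - 1))
    return coef, r2, f_stat
```

In the study each row of `xs` would hold the System3 measures of one
module and `y` the corresponding dependent variable; the resulting F
value would then be compared against a critical value from the
distribution table.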

Footnotes

* Excel, a registered trademark of Microsoft Corporation, is a
  spreadsheet product for the Macintosh that provides database and
  graphic functions and is designed for numerical processing
  applications.

* Macspin, a registered trademark of D2 Software, Inc., Austin,
  Texas, is a tool for looking at three- and higher-dimensional
  data. It displays abstract multivariate data in a direct way,
  and its display can reveal striking patterns and relationships.

* Statfast, a registered trademark of StatSoft, Inc., is a
  high-performance statistical package developed in FORTRAN
  (MacFortran, Absoft, Inc.). It offers a speed of statistical
  analysis that makes it fully suitable for scientific and business
  applications.

* Macintosh, a trademark licensed to Apple Computer, Inc., is a
  32-bit microcomputer with a powerful 68000 CPU.

* UNIX is a registered trademark of AT&T.
Chapter 3. DATA ANALYSIS


The data analysis was performed with the three statistical
packages. The source data [Appendix B] were obtained from the
source code in the two directories of Unix. The data were evaluated
with the specialized capabilities of the packages, such as
numerical processing, graphical analysis, and statistical analysis.
The dependent variables of System5 were compared with the
independent variables of System3 for the analysis [Table 1]. The
relations between a dependent variable and the independent
variables were tested by multiple regression. The hypothesis test
was used to determine whether an independent variable is acceptable
or not. For example, the first dependent variable in Table 1 is the
lines of code of System5, and the independent variables consist of
the lines of code, weight, weight per line, and average nesting
level of System3. The lines of code in System5 were highly
correlated with the lines of code (99.99%) and weight (88.6%) of
System3. The positive relationship for lines of code and the
negative relationship for the weight suggest that we can predict
both relationships. But the low significance levels of the weight
per line (28.4%) and average nesting level (57.4%) imply that we
cannot predict those relationships, because they lack significance.
The explanatory evaluations for each variable are given in the
results chapter.
Table 1  The Data Analysis Table

Independent variables (123 modules of each): LOC3, WE3, WE/LOC3,
Av.nst3

Dependent variable: LOC5
confidence: 99.99%    correlation: [illegible]

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | pos    | neg    | pos     | neg
Significance (%)    | 99.99  | 88.6   | 28.4    | 57.4
F:  34.2 > 4.95     | 99.99  | 88.6   | 28.4    | 57.4
F:  45.6 > 5.78     | 99.99  | 85.3   | 12.1    |
F:  69.0 > 7.32     | 99.99  | 91.7   |         |

Dependent variable: WE5
confidence: 99.99%    correlation: 0.6623

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | pos    | pos    | pos     | neg
Significance (%)    | 95.7   | 13.8   | 23.6    | 50.7
F:  21.5 > 4.95     | 95.7   | 13.8   | 23.6    | 50.7
F:  43.5 > 7.32     | 99.99  |        |         | 47.1
F:  87.1 > 11.4     | 99.99  |        |         |

Dependent variable: WE/LOC5
confidence: 99.99%    correlation: 0.8655

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | pos    | pos    | pos     | neg
Significance (%)    | 5.9    | 12.9   | 99.99   | 15.8
F:  82.1 > 4.95     | 5.9    | 12.9   | 99.95   | 15.8
F: 166.5 > 7.32     |        |        | 99.95   | 15.3

Dependent variable: Av.nst5
confidence: 99.99%    correlation: 0.9770

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | pos    | neg    | pos     | pos
Significance (%)    | 93.4   | 82.2   | 23.4    | 99.99
F: 577.2 > 4.95     | 93.4   | 82.2   | 23.4    | 99.99
F: 776.0 > 5.78     | 95.9   | 86.4   |         | 99.99
(continued)

Dependent variable: LOC5-LOC3
confidence: 90.9%    correlation: 0.2634

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | pos    | neg    | pos     | neg
Significance (%)    | 96.2   | 88.6   | 28.4    | 57.4
F: 2.050            | 96.2   | 88.6   | 28.4    | 57.4
F: 2.711            | 97.7   | 91.6   |         | 52.9
F: 3.811            | 97.6   | 91.7   |         |

Dependent variable: (LOC5-LOC3) / LOC3
confidence: 36.4%    correlation: 0.1503

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | pos    | neg    | pos     | neg
Significance (%)    | 75.7   | 76.8   | 48.4    | 76.9
F: 0.636            | 75.7   | 76.8   | 48.4    | 76.9
F: 0.705            | 66.2   | 68.1   |         | 70.1
F: 0.513            | 64.7   | 68.3   |         |

Dependent variable: WE5 - WE3
confidence: 86.9%    correlation: 0.2485

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | pos    | neg    | pos     | neg
Significance (%)    | 95.7   | 88.9   | 23.6    | 50.7
F: 1.809            | 95.7   | 88.9   | 23.6    | 50.7
F: 2.404            | 97.7   | 92.8   |         | 47.1
F: 3.417            | 97.6   | 92.9   |         |

Dependent variable: (WE5-WE3) / WE3
confidence: 99.99%    correlation: 0.4470

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | neg    | pos    | neg     | neg
Significance (%)    | 99.4   | 99.4   | 99.99   | 16.2
F:  6.87 > 4.95     | 99.4   | 99.4   | 99.99   | 16.2
F:  9.22 > 5.78     | 99.6   | 99.5   | 99.99   |

Dependent variable: (WE/LOC5) - (WE/LOC3)
confidence: 99.99%    correlation: 0.4259

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | pos    | pos    | neg     | neg
Significance (%)    | 6.6    | 12.2   | 99.9    | 16.4
F:  6.09 > 4.95     | 6.6    | 12.2   | 99.9    | 16.4
F: 12.23 > 7.32     |        |        | 99.9    | 15.8
F: 24.64 > 11.4     |        |        | 99.9    |

Dependent variable: [(WE/LOC5) - (WE/LOC3)] / (WE/LOC3)
confidence: 99.99%    correlation: 0.5122

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | neg    | pos    | neg     | neg
Significance (%)    | 99.9   | 99.8   | 99.99   | 17.2
F:  9.78 > 4.95     | 99.9   | 99.8   | 99.99   | 17.2
(continued)

Dependent variable: (Av.nst5 - Av.nst3)
confidence: 99.8%    correlation: 0.3843

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | pos    | neg    | pos     | neg
Significance (%)    | 93.4   | 82.2   | 23.3    | 99.9
F:  4.77 > 4.50     | 93.4   | 82.2   | 23.3    | 99.9
F:  6.38 > 5.78     | 95.9   | 86.4   |         | 99.9
F:  8.37 > 7.32     | 96.8   |        |         | 99.9

Dependent variable: (Av.nst5 - Av.nst3) / Av.nst3
confidence: 79.1%    correlation: 0.2267

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | pos    | neg    | neg     | neg
Significance (%)    | 6.3    | 15.9   | 30.2    | 94.3
F: 1.490 ~          | 6.3    | 15.9   | 30.2    | 94.3
F: 2.798 ~          |        |        | 52.5    | 95.1
F: 5.087 ~          |        |        |         | 97.6

Dependent variable: LOC5-LOC3 = 0 (8), LOC5-LOC3 = 1 (115)
correlation: 0.2120

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | neg    | pos    | neg     | pos
Significance (%)    | 25.7   | 5.7    | 57.8    | 80.2
F: 1.192 ~          | 25.7   | 5.7    | 57.8    | 80.2
F: 1.881 ~          | 75.8   |        |         | 90.6
F: 2.379 ~          |        |        |         | 87.8

* pos is a positive relation, and neg is a negative relation.
(continued)

LOC5-LOC3 vs. WE5-WE3 vs. Av.nst5-Av.nst3
(123 modules of each)
#: value from the distribution table

Dependent variable: LOC5-LOC3
confidence: 99.99%    correlation: 0.9892

                    | WE5-WE3   | WE/LOC5-WE/LOC3 | Av.nst5-Av.nst3
Relationships       | pos       | neg             | pos
Significance (%)    | 99.99     | 99.99           | 99.9
F: 1682.35          | t: 65.82  | t: 6.07         | t: 3.39
#  4.95             | #  3.84   | #  3.84         | #  3.16

Dependent variable: WE5-WE3
confidence: 99.99%    correlation: 0.9973

                    | LOC5-LOC3 | WE/LOC5-WE/LOC3 | Av.nst5-Av.nst3
Relationships       | pos       | pos             | neg
Significance (%)    | 99.99     | 99.99           | 99.7
F: 1748.12          | t: 65.82  | t: 6.65         | t: 3.03
#  4.95             | #  3.84   | #  3.84         | #  2.76

Dependent variable: Av.nst5-Av.nst3
confidence: 99.99%    correlation: 0.5153

                    | LOC5-LOC3 | WE5-WE3         | WE/LOC5-WE/LOC3
Relationships       | pos       | neg             | pos
Significance (%)    | 99.9      | 99.7            | 99.99
F: 13.23            | t: 3.39   | t: 3.03         | t: 4.89
#  4.95             | #  3.16   | #  2.76         | #  3.84
Chapter 4. THE RESULTS


Regression analysis was used to check the relationships between
variables and also to assist in determining the best set of
predictor variables. In order to evaluate the observed values among
the several variables, a hypothesis test was performed first. The
alternative hypothesis is two-sided: we tested all the predictors
in order to detect those inversely related to Y as well as those
directly related. The null and alternative hypotheses can be
written as

Y = A + B1X1 + B2X2 + ... + BkXk + e

H0 : B = 0  (no relationship)      : reject
H1 : B > 0  (direct relationship)  : accept
H1 : B < 0  (inverse relationship) : accept

H : hypothesis
Y : linear function of k predictor variables, X1, X2, ... Xk
A : intercept (constant term)
B : coefficient from the regression equation
e : error term

The test showed which hypotheses were accepted or rejected under
the given criteria.

To verify the validity of the predictors, the F-distribution and
Student's t-distribution were used to test whether there were
significant differences between the means of samples drawn from the
normally distributed variables. An F-test was therefore performed
for the dependent variable against the set of independent
variables, giving a confidence level, and a t-test was performed
for each of the independent variables, giving a significance level
for each variable.
If the F value or the t values are acceptable, relationships among
the variables exist. The regression analysis was thus concerned
with investigating the relationships among several variables by
showing which variables could be strong predictors of the response
variable.

The experiment analyzed the two versions of Unix in terms of
dependent variables and independent variables. The variables were
chosen among the terms specified in the software measures research.
The relations between a dependent variable and the independent
variables were obtained by the multiple regression test.

The percentages of the confidence level and the significance level
were used to determine whether a variable is reliable or not.
Usually, levels over 75 percent are considered reliable by most
logicians and mathematicians.

A negative t-test value implies a negative correlation between
variables, and a positive value implies a positive correlation. The
variables were therefore evaluated by these correlations.
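
The decision rule described in the last two paragraphs (a 75 percent
cutoff, with the sign of the t value giving the direction) can be
written as a small helper. This is a sketch of the rule as stated, not
code from the study; the function name is hypothetical, and because
the text reports only the direction of each relation, the t values
below are placeholders whose sign alone matters.

```python
def classify_predictor(t_value: float, significance_pct: float,
                       cutoff_pct: float = 75.0) -> str:
    """Return 'pos', 'neg', or 'not significant' for one predictor."""
    # Levels over the cutoff are treated as reliable; others are set aside.
    if significance_pct <= cutoff_pct or t_value == 0:
        return "not significant"
    return "pos" if t_value > 0 else "neg"


# Significance levels reported for LOC5-LOC3 against the System3
# measures; the t values are hypothetical and only carry the sign.
reported = {
    "LOC3": (+1.0, 96.2),
    "WE3": (-1.0, 88.6),
    "WE/LOC3": (+1.0, 28.4),
    "Av.nst3": (-1.0, 57.4),
}
verdicts = {name: classify_predictor(t, s) for name, (t, s) in reported.items()}
```

Applied to these four values, the rule keeps the lines of code as a
positive predictor and the weight as a negative one, and sets aside
the weight per line and the average nesting level.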
4.1 Lines of Code

Independent variables (123 modules of each): LOC3, WE3, WE/LOC3,
Av.nst3

Dependent variable: LOC5-LOC3
confidence: 90.9%    correlation: 0.2634

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | pos    | neg    | pos     | neg
Significance (%)    | 96.2   | 88.6   | 28.4    | 57.4
F: 2.050 ~          | 96.2   | 88.6   | 28.4    | 57.4
F: 2.711 ~          | 97.7   | 91.6   |         | 52.9
F: 3.811 ~          | 97.6   | 91.7   |         |

Dependent variable: (LOC5-LOC3) / LOC3
confidence: 36.4%    correlation: 0.1503

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | pos    | neg    | pos     | neg
Significance (%)    | 75.7   | 76.8   | 48.4    | 76.9
F: 0.636 ~          | 75.7   | 76.8   | 48.4    | 76.9
F: 0.705 ~          | 66.2   | 68.1   |         | 70.1
F: 0.513 ~          | 64.7   | 68.3   |         |

Table 2  Changes in Lines of Code

The partial results of the multiple regression test for changes in
lines of code are given in Table 2. The dependent variable is the
increase in the number of lines of code, and the independent
variables consist of the lines of code, weight, weight per line,
and average nesting level of System3. We investigated how the lines
of code in System5 are related to the lines of code, weight, weight
per line, and average nesting level of System3.

The increase in lines of code in System5 was highly correlated with
the lines of code (96.2%) and weight (88.6%) of System3, with
significance levels that were much higher than the standard cutoff
point of 75 percent. The positive relationship for lines of code
implies that the larger modules will have larger increases in lines
of code [section 5.1.c]. But the significance levels of the weight
per line (28.4%) and average nesting level (57.4%) of System3 were
below the standard cutoff point. The low significance levels imply
that we cannot predict those relationships, because they lack
significance. The relation between the updated lines and the weight
is surprising. The negative relation implies that the modules with
higher weighting have smaller changes in lines of code [section
5.2.b]. Therefore, a module with many high-risk statements will
tend to decrease in lines of code during maintenance or enhancement
[section 5.2].

4.2 Weights

Independent variables (123 modules of each): LOC3, WE3, WE/LOC3,
Av.nst3

Dependent variable: WE5 - WE3
confidence: 86.9%    correlation: 0.2485

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | pos    | neg    | pos     | neg
Significance (%)    | 95.7   | 88.9   | 23.6    | 50.7
F: 1.809 ~          | 95.7   | 88.9   | 23.6    | 50.7
F: 2.404 ~          | 97.7   | 92.8   |         | 47.1
F: 3.417 ~          | 97.6   | 92.9   |         |

Dependent variable: (WE/LOC5) - (WE/LOC3)
confidence: 99.99%    correlation: 0.4259

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | pos    | pos    | neg     | neg
Significance (%)    | 6.6    | 12.2   | 99.9    | 16.4
F:  6.09 > 4.95     | 6.6    | 12.2   | 99.9    | 16.4
F: 12.23 > 7.32     |        |        | 99.9    | 15.8
F: 24.64 > 11.4     |        |        | 99.9    |

Table 3  Changes in Weights

The dependent variable in Table 3 is the difference of weight
between the two systems. The investigation was how weight changes
are related to the lines of code, weight, weight per line, and
average nesting level of System3. The difference of weight between
the two systems was correlated with the lines of code (95.7%) and
weight (88.9%) of System3, with significance levels that were
higher than the standard cutoff point. But we did not consider the
relations for the weight per line (23.6%) and average nesting level
(50.7%) of System3 because of their low significance levels. The
positive relation for the lines of code implies that larger modules
tend to have larger increases in weights [section 5.2.a]. The
negative relation for the weight implies that modules with higher
weighting tend to have decreases in weights [section 5.4.a]. If the
weight of a module is relatively high, it will tend to decrease
during maintenance [section 5.4].
4.3 Average Nesting Level

Independent variables (123 modules of each): LOC3, WE3, WE/LOC3,
Av.nst3

Dependent variable: (Av.nst5 - Av.nst3)
confidence: 99.8%    correlation: 0.3843

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | pos    | neg    | pos     | neg
Significance (%)    | 93.4   | 82.2   | 23.3    | 99.9
F:  4.77 > 4.50     | 93.4   | 82.2   | 23.3    | 99.9
F:  6.38 > 5.78     | 95.9   | 86.4   |         | 99.9
F:  8.37 > 7.32     | 96.8   |        |         | 99.9

Dependent variable: (Av.nst5 - Av.nst3) / Av.nst3
confidence: 79.1%    correlation: 0.2267

                    | LOC3   | WE3    | WE/LOC3 | Av.nst3
Relationships       | pos    | neg    | neg     | neg
Significance (%)    | 6.3    | 15.9   | 30.2    | 94.3
F: 1.490 ~          | 6.3    | 15.9   | 30.2    | 94.3
F: 2.798 ~          |        |        | 52.5    | 95.1
F: 5.087 ~          |        |        |         | 97.6

Table 4  Changes in Average Nesting Level


The difference of the average nesting level between System3 and
System5 in Table 4 was compared with the lines of code, weight,
weight per line, and average nesting level of System3. The
significance levels of the lines of code (93.4%), weight (82.2%),
and average nesting level (99.9%) were very high. So the difference
of the average nesting level between the two systems was correlated
with the lines of code, weight, and average nesting level of
System3, and these independent variables will be strong predictors.
But the weight per line (23.3%) of System3 will not be a predictor
because of its low significance level. The relation between the
difference of average nesting level and the lines of code was
positive. The positive relation implies that the larger modules
will tend to have larger increases in nesting level [section
5.3.a]. In other words, if the size of the code is large, the
nesting levels will tend to increase more during maintenance or
enhancement [section 5.3]. The negative relation for the weight
implies that modules with higher weighting will have smaller
increases or even decreases in average nesting level [section
5.5.a]. The other negative relation, for the average nesting level,
implies that modules with higher nesting levels tend to have
decreases or smaller increases in nesting level [section 5.6.c]. In
other words, if the nesting levels are high, they will tend to be
reduced or only slightly increased during maintenance [section
5.6].

4.4 Module Changes

Measures               | Changed Modules | Not Changed Modules | Total
Average lines of code  | 279.365         | 417.125             | 288.325
Average nesting levels | 68.643          | 51.943              | 67.556

Table 5  CHANGED versus NOT CHANGED MODULES

The  changed  and  not  changed  modules  in  Table  5  were  measured 
by  quantitative  analysis.  We  investigated  which  modules  will  not 
be  changed  and  which  ones  will  be  changed.  One  surprise  was  that 
most     of     the  unchanged  modules  were  big.     The  average     lines  of 
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code(417.125)  of  the  unchanged  modules  was  greater  than  the 
average  lines  of  code ( 279 . 365 )  of  the  changed  modules.  These 
numbers  implies  that  the  average  size  of  a  changed  module  is 
usually  smaller  than  the  average  size  of  ones  not  changed[section 
5.1. a].  In  other  words,  if  a  module  size  is  relatively  big,  it 
will     tend    not  to  be  changed  during  maintenance[section  5.1.b]. 

The average nesting levels (68.643) of changed modules were
greater than the average nesting levels (51.943) of not changed
modules. The low average nesting level of the not changed modules
implies that modules with lower nesting levels will tend not to
be changed [section 5.6.b]. The high average nesting level of the
changed modules implies that the highly nested modules are likely
to be changed during maintenance or enhancement [section 5.6.a].
Therefore, these results seem to suggest that size and nesting
may be good predictors for maintenance [section 5.1, 5.6].
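
The Total column of Table 5 can be cross-checked against the two group averages. The sketch below assumes that the 123 modules mentioned in Appendix C are the modules summarized in Table 5, and that they split into 115 changed and 8 unchanged modules. That split is not stated in the text; it is inferred here because it is the split that reproduces the Total column. The function name `combined_mean` is illustrative.

```python
def combined_mean(mean_a, n_a, mean_b, n_b):
    """Mean of the union of two groups, given each group's mean and size."""
    return (mean_a * n_a + mean_b * n_b) / (n_a + n_b)


# Assumed group sizes (inferred, not given in the text): 115 + 8 = 123.
N_CHANGED = 115
N_UNCHANGED = 8

total_loc = combined_mean(279.365, N_CHANGED, 417.125, N_UNCHANGED)
total_nesting = combined_mean(68.643, N_CHANGED, 51.943, N_UNCHANGED)

print(f"average lines of code, all modules:  {total_loc:.3f}")
print(f"average nesting levels, all modules: {total_nesting:.3f}")
```

Under these assumptions both results agree with the Total column to within rounding. If the inferred split is right, the unchanged group is small, so its averages may be sensitive to a few large modules.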

4.5  Predictors

The predictor table [Table 6] consists of the predictors and
the predicted factors that had strong relationships among the
lines of code, weights, and nesting levels between System3 and
System5. The relationships were obtained through the whole
analysis process. Using this table, we could predict program
changes in modules during maintenance or enhancement.

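Table 6     Predictor Table

(rows: predictors; columns: predicted factors)

+----------------+---------+---------------+----------+----------------+-----------------+
| Predictors     |         | Lines of code | Weights  | Nesting levels | Modules will be |
|                |         |               |          |                | changed or not  |
+----------------+---------+---------------+----------+----------------+-----------------+
| Lines of code  | larger  | increase      | increase | increase       | not change      |
|                |         | (5.1.c)       | (5.2.a)  | (5.3.a)        | (5.1.b)         |
|                | smaller |               |          |                | change          |
|                |         |               |          |                | (5.1.a)         |
+----------------+---------+---------------+----------+----------------+-----------------+
| Weights        | higher  | decrease      | decrease | decrease       |                 |
|                |         | (5.2.b)       | (5.4.a)  | (5.5.a)        |                 |
|                | lower   |               | increase |                |                 |
|                |         |               | (5.4.b)  |                |                 |
+----------------+---------+---------------+----------+----------------+-----------------+
| Nesting levels | higher  | decrease      | decrease | decrease       | change          |
|                |         | (5.3.b)       | (5.5.b)  | (5.6.c)        | (5.6.a)         |
|                | lower   |               |          |                | not change      |
|                |         |               |          |                | (5.6.b)         |
+----------------+---------+---------------+----------+----------------+-----------------+

*  (5.x.y) gives the number of the corresponding conclusion in Chapter 5.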
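
Table 6 can also be read as a lookup structure. The following is a minimal sketch of that reading, not a tool from this study; the names `PREDICTOR_TABLE` and `predict` and the key spellings are illustrative assumptions. In the table, "decrease" is shorthand for "smaller increases or even decreases", as worded in the conclusions it cites.

```python
# Table 6 as a nested mapping:
# predictor -> level -> predicted factor -> (direction, conclusion number)
PREDICTOR_TABLE = {
    "lines_of_code": {
        "larger": {
            "lines_of_code": ("increase", "5.1.c"),
            "weights": ("increase", "5.2.a"),
            "nesting_levels": ("increase", "5.3.a"),
            "module_changed": ("not change", "5.1.b"),
        },
        "smaller": {
            "module_changed": ("change", "5.1.a"),
        },
    },
    "weights": {
        "higher": {
            "lines_of_code": ("decrease", "5.2.b"),
            "weights": ("decrease", "5.4.a"),
            "nesting_levels": ("decrease", "5.5.a"),
        },
        "lower": {
            "weights": ("increase", "5.4.b"),
        },
    },
    "nesting_levels": {
        "higher": {
            "lines_of_code": ("decrease", "5.3.b"),
            "weights": ("decrease", "5.5.b"),
            "nesting_levels": ("decrease", "5.6.c"),
            "module_changed": ("change", "5.6.a"),
        },
        "lower": {
            "module_changed": ("not change", "5.6.b"),
        },
    },
}


def predict(predictor, level):
    """Return every Table 6 prediction for one predictor at one level."""
    return PREDICTOR_TABLE[predictor][level]


for factor, (direction, conclusion) in predict("nesting_levels", "higher").items():
    print(f"{factor}: {direction} ({conclusion})")
```

Empty cells of the table are simply absent from the mapping, so a lookup returns only the relationships the table reports.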

Chapter  5.  CONCLUSIONS


The following relationships were assessed from our empirical
data for the predictor variables: module size (lines of code),
weight, and average nesting level. These variables can be used
to better understand software maintenance. The predictors can be
evaluated as follows:

5.1  LOC  vs.  LOC 

a)  The average size of a changed module is usually smaller than
    the average size of ones not changed. [section 4.4]

b)  If a module size is relatively big, it will tend not to be
    changed. [section 4.4]

c)  Larger modules tend to have larger increases in lines of
    code. The percentage of updated lines will increase too. If
    the modules are changed, more code will be added.
    [section 4.1]

Changes in the size of modules during maintenance will be
related to the original size of the modules. We predict that the
smaller modules are more likely to be changed. When a larger
module is changed, it will grow more than a smaller one.

5.2  LOC  vs.  WE 

a)  Larger modules tend to have larger increases in weights.
    [section 4.2]

b)  Modules with higher weighting tend to have smaller increases
    or even decreases in lines of code. [section 4.1]

More high risk statements will be added to larger modules
when they are updated. A module which has more high risk
statements (that is, a higher weight) will tend to be modified
less.

5.3  Ave_nst  vs.  LOC

a)  Larger modules tend to have larger increases in nesting
    level. [section 4.3]

b)  Modules with higher nesting levels tend to have smaller
    increases or even decreases in lines of code. [section 4.1]

Average nesting levels will increase in larger modules when
they are maintained. A module which has a higher nesting level
will tend to be modified less.

5.4  WE  vs.  WE 

a)  Modules with higher weighting tend to have smaller increases
    or even decreases in weight. [section 4.2]

b)  Modules with lower weighting tend to have larger increases in
    weight. [section 4.2]

Modules which have more high risk statements will tend to
decrease those statements when they are maintained.

5.5  WE  vs.  Ave_nst

a)  Modules with higher weighting tend to have decreases in
    average nesting level. [section 4.3]

b)  Modules with higher nesting levels tend to have smaller
    increases or even decreases in weight. [section 4.2]

Modules which have more high risk statements will tend to
decrease in average nesting level when they are maintained.

5.6  Ave_nst  vs.  Ave_nst

a)  The highly nested modules are likely to be changed during
    maintenance. [section 4.4]

b)  If nesting levels are low, a module will tend not to be
    changed. [section 4.4]

c)  Modules with higher nesting levels tend to have smaller
    increases or even decreases in nesting level. [section 4.3]

Modules which have a higher average nesting level will tend to
decrease in average nesting level during maintenance or
enhancement.

Currently, the process of maintaining software is not well
understood. The maintenance task will be helped by knowing which
modules are susceptible to change and which ones should be
rewritten. The significant relationships between the source code
and possible changes will be used to suggest improvements in both
the program development process and the program maintenance
process.

What are the possibilities for improving maintenance?

Our predictive approach to maintenance will enable better
planning and management of maintenance work. Making program
modules more easily maintainable could reduce the maintenance
tasks. Our approach suggests the following possibilities:

a)  Identifying some types of maintenance work.

b)  Identifying modules to be rewritten (which ones to modify).
    The modules that are the most change prone can be
    rewritten to improve the future maintainability of the
    program.

c)  Identifying normal maintenance vs. abnormal maintenance.

What advice can be given to developers?


To solve the maintenance problems, the tasks of developing
software must be simplified and automated. The developers might
be able to reduce the maintenance cost by trying to develop
stable modules. Our advice is as follows:

a)  Stabilize the size of code.

    -  Larger modules will be increased more than small ones.
    -  More high risk statements will be added to larger modules.
    -  Nesting levels will increase in larger modules.
    -  Relatively smaller modules will be more stable than larger
       ones.

b)  Don't discourage code with higher weighting.

    -  Modules with higher weighting will tend to decrease in
       lines of code.
    -  Modules with higher weighting tend to have decreases in
       weights.
    -  Modules with higher weighting tend to have decreases in
       nesting level.
    -  Modules with higher weighting will tend to be stable.

c)  Don't discourage highly nested code.

    -  Modules with higher nesting levels tend to have smaller
       increases in lines of code.
    -  Modules with higher nesting levels tend to have decreases
       in weight.
    -  Modules with higher nesting levels tend to have smaller
       increases in nesting level.
    -  However, highly nested modules will be likely to change.

Chapter  7.  FUTURE  WORK

The results of our analyses suggest five different areas for
future research:

1)  the analysis will continue, to try to find more relationships
    between the source code and the changes,

2)  the analysis will be done on other systems in other languages
    to try to generalize the conclusions,

3)  the relationships will be used to develop a maintenance
    measure to predict the possible changes according to the
    module size, weight, and nesting level,

4)  changes to systems during development will be compared
    further with changes to systems during maintenance, and

5)  patterns of changes for each of the maintenance activities
    defined by Swanson (corrective, adaptive, and perfective
    maintenance) will be developed to aid in analyzing maintenance.

APPENDIX  A :  SHELL  PROGRAMS


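A.1  Counting Program for the Nesting Levels


#----------------------------------------------------#
#  Indent module : checks the nesting level.         #
#----------------------------------------------------#


awk -s '
# Each pattern matches a run of leading tab characters; a later
# (deeper) match overrides an earlier one. The tab patterns are
# reconstructed, since the whitespace inside them did not survive
# reproduction of the listing.
                        {count = 0}
/^\t/                   {count = 1}
/^\t\t/                 {count = 2}
/^\t\t\t/               {count = 3}
/^\t\t\t\t/             {count = 4}
/^\t\t\t\t\t/           {count = 5}
/^\t\t\t\t\t\t/         {count = 6}
                        {print count}' $1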
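
For readers who want to experiment without awk, the counting step can be sketched in Python. This is an illustrative equivalent under stated assumptions, not the program used in the study: it assumes one leading tab character per nesting level and caps the level at 6, the deepest level the listing distinguishes. The names `nesting_level` and `nesting_levels` are hypothetical.

```python
MAX_LEVEL = 6  # the awk listing distinguishes levels 0 through 6


def nesting_level(line, max_level=MAX_LEVEL):
    """Count leading tab characters on one source line, capped at max_level."""
    leading_tabs = len(line) - len(line.lstrip("\t"))
    return min(leading_tabs, max_level)


def nesting_levels(source_text):
    """Return one nesting level per line of a source file's text."""
    return [nesting_level(line) for line in source_text.splitlines()]


sample = "int main()\n{\n\tif (x)\n\t\ty = 1;\n}"
print(nesting_levels(sample))
```

Source formatted with spaces instead of tabs would report level 0 on every line under this assumption, so indentation style has to be uniform before the measure is meaningful.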


A.2  Statistics for the Nesting Levels


#-----------------------------------------------------------------#
#  Average nesting module: calculates the average nesting levels  #
#-----------------------------------------------------------------#


awk -s '
BEGIN { printf "Levels\n"  >> "totalO"
        printf "\n"        >> "totalO"
        zero  = 0
        one   = 0
        two   = 0
        three = 0
        four  = 0
        five  = 0
        six   = 0
        sum   = 0
      }
/0/   {zero  = zero  + 1}
/1/   {one   = one   + 1}
/2/   {two   = two   + 1}
/3/   {three = three + 1}
/4/   {four  = four  + 1}
/5/   {five  = five  + 1}
/6/   {six   = six   + 1}
END   { printf "zero    %6d\n", zero   >> "totalO"
        printf "one     %6d\n", one    >> "totalO"
        printf "two     %6d\n", two    >> "totalO"
        printf "three   %6d\n", three  >> "totalO"
        printf "four    %6d\n", four   >> "totalO"
        printf "five    %6d\n", five   >> "totalO"
        printf "six     %6d\n", six    >> "totalO"
        printf "\n"                    >> "totalO"

        # percentage of lines at each nesting level
        zeroave = (zero * 100) / NR
        printf "Zeroave = %5.3f\n", zeroave    >> "totalO"
        oneave = (one * 100) / NR
        printf "Oneave = %5.3f\n", oneave      >> "totalO"
        twoave = (two * 100) / NR
        printf "Twoave = %5.3f\n", twoave      >> "totalO"
        threeave = (three * 100) / NR
        printf "Threeave = %5.3f\n", threeave  >> "totalO"
        fourave = (four * 100) / NR
        printf "Fourave = %5.3f\n", fourave    >> "totalO"
        fiveave = (five * 100) / NR
        printf "Fiveave = %5.3f\n", fiveave    >> "totalO"
        sixave = (six * 100) / NR
        printf "Sixave = %5.3f\n", sixave      >> "totalO"

        # weighted total: level k contributes with weight k + 1
        average  = 100 * (zero + one*2 + two*3 + three*4) / NR
        average += 100 * (four*5 + five*6 + six*7) / NR
        printf "Total average = %5.3f\n", average  >> "totalO"
        sum = zero + one + two + three + four + five + six
        printf "Sum = %10d\n", sum                 >> "totalO"
        printf "Lines of code = %10d\n", NR        >> "totalO"
        printf "Sum/Lines : %10.3f\n", (sum/NR)    >> "totalO"
        printf "\n"                                >> "totalO"
        printf "\n"                                >> "totalO"
        printf "\n"                                >> "totalO"
        printf "\n"                                >> "totalO"
      }' $1

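The arithmetic in the statistics listing can be sketched in Python as follows. This is a minimal illustration, assuming the input is the list of per-line nesting levels (0 through 6) produced by the counting program, one value per line, so that the number of values plays the role of awk's `NR`. The function names are hypothetical.

```python
from collections import Counter


def level_percentages(levels):
    """Percentage of lines at each nesting level 0..6."""
    counts = Counter(levels)
    total = len(levels)
    return {level: 100.0 * counts.get(level, 0) / total for level in range(7)}


def total_average(levels):
    """The listing's 'Total average': level k is weighted by k + 1."""
    return 100.0 * sum(level + 1 for level in levels) / len(levels)


levels = [0, 0, 1, 2]
print(level_percentages(levels))
print(total_average(levels))
```

Because level 0 carries weight 1 rather than 0 in the listing's formula, a file with no nesting at all scores 100 under this reading, not 0.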

A.3  Weights

#-----------------------------------------------------------------#
#  Weight module: calculates the weight of each source program    #
#-----------------------------------------------------------------#

BEGIN     { CommentSw = 0; LineNumber = 0 }
{
    #
    #  process all the fields in the current record.
    #
    i = 1
    while (i <= NF)
    {
        if ((LineNumber + 2) <= NR)
        {
            count["blanklines"] = count["blanklines"] + NR - (LineNumber + 1)
            LineNumber = NR
        }
        #
        #  check whether the comment switch is true.
        #
        if (CommentSw == 1)
        {
            if ($i == "*/")
                CommentSw = 0
        }
        #  if the comment switch is false.
        else {
            if ($i == "/*")
            {
                CommentSw = 1
                count["comments"]++
            }
            #  if the current field is not a comment field.
            else {
                if ((($1 ~ /\:/) || ($2 ~ /\:/)) && ($i == $1))
                {
                    if ($1 ~ /default/)     count["default"]++
                    else if ($1 != "case")  count["labels"]++
                }

                if ($i ~ /\(/)
                {
                    #  split the source line delimited by "(".
                    NoOfElement = split($i, Array, "(")

                    #  check functions inside the 'if, while, for...'
                    #  (the last operand is cut off in the listing; 1 is assumed)
                    count["functions"] = count["functions"] + NoOfElement - 1

                    for (k = 1; k <= NoOfElement; k++)
                    {
                        if (Array[k] == "if")
                            { count["if"]++;     count["functions"]-- }
                        else if (Array[k] == "for")
                            { count["for"]++;    count["functions"]-- }
                        else if (Array[k] == "while")
                            { count["while"]++;  count["functions"]-- }
                        else if (Array[k] == "switch")
                            { count["switch"]++; count["functions"]-- }
                        else if (Array[k] == "return")
                            { count["return"]++; count["functions"]-- }
                        else if ((Array[k] == "getchar") || (Array[k] == "getc"))
                            { count["input"]++;  count["functions"]-- }
                        else if (Array[k] == "scanf")
                            { count["input"]++;  count["functions"]-- }
                        else if ((Array[k] == "putchar") || (Array[k] == "putc"))
                            { count["output"]++; count["functions"]-- }
                        else if (Array[k] == "printf")
                            { count["output"]++; count["functions"]-- }
                        else if (Array[k] == "printw")
                            { count["output"]++; count["functions"]-- }
                    }
                }       #  end 'if ($i ~ /\(/)'

                if ($i ~ /\=/)
                {
                    if (($i !~ /\!=/) && ($i !~ /\==/) && ($i !~ /\<=/) && ($i !~ /\>=/))
                    {
                        count["assignments"]++
                    }
                }       #  end 'if ($i ~ /\=/)'

                #  declarations and keywords
                if      ($i == "int")       count["declarations"]++
                else if ($i == "float")     count["declarations"]++
                else if ($i == "double")    count["declarations"]++
                else if ($i == "struct")    count["declarations"]++
                else if ($i == "auto")      count["declarations"]++
                else if ($i == "extern")    count["declarations"]++
                else if ($i == "register")  count["declarations"]++
                else if ($i == "static")    count["declarations"]++
                else if ($i == "if")
                    { count["if"]++;     count["functions"]-- }
                else if ($i == "for")
                    { count["for"]++;    count["functions"]-- }
                else if ($i == "while")
                    { count["while"]++;  count["functions"]-- }
                else if ($i == "switch")
                    { count["switch"]++; count["functions"]-- }
                else if (($i == "return") || ($i == "return;"))
                {
                    count["return"]++
                    if (($i == "return") && ($(i+1) ~ /\(/))
                        count["functions"]--
                }
                else if (($i == "getchar") || ($i == "getc"))
                    { count["input"]++;  count["functions"]-- }
                else if ($i == "scanf")
                    { count["input"]++;  count["functions"]-- }
                else if (($i == "putchar") || ($i == "putc"))
                    { count["output"]++; count["functions"]-- }
                else if ($i == "printf")
                    { count["output"]++; count["functions"]-- }
                else if ($i == "printw")
                    { count["output"]++; count["functions"]-- }
                else if ($i == "else")      count["else"]++
                else if ($i ~ /\#/)         count["preprocessor"]++
                else if ($i == "case")      count["case"]++
                else if ($i == "goto")      count["goto"]++
                else if (($i == "break") || ($i == "break;"))
                                            count["break"]++
                else if (($i == "continue") || ($i == "continue;"))
                                            count["continue"]++
            }
        }
        LineNumber = NR
        ++i
    }
}

END {
    printf "File Name : %s\n", FILENAME                   >> "totalO"
    printf "\n"                                           >> "totalO"
    printf "=========\n"                                  >> "totalO"
    printf "for            %10d\n", count["for"]          >> "totalO"
    printf "while          %10d\n", count["while"]        >> "totalO"
    printf "if             %10d\n", count["if"]           >> "totalO"
    printf "else           %10d\n", count["else"]         >> "totalO"
    printf "switch         %10d\n", count["switch"]       >> "totalO"
    printf "case           %10d\n", count["case"]         >> "totalO"
    printf "goto           %10d\n", count["goto"]         >> "totalO"
    printf "break          %10d\n", count["break"]        >> "totalO"
    printf "continue       %10d\n", count["continue"]     >> "totalO"
    printf "assignments    %10d\n", count["assignments"]  >> "totalO"
    printf "preprocessor   %10d\n", count["preprocessor"] >> "totalO"
    printf "comments       %10d\n", count["comments"]     >> "totalO"
    printf "blanklines     %10d\n", count["blanklines"]   >> "totalO"
    printf "return         %10d\n", count["return"]       >> "totalO"
    printf "input          %10d\n", count["input"]        >> "totalO"
    printf "output         %10d\n", count["output"]       >> "totalO"
    printf "functions      %10d\n", count["functions"]    >> "totalO"
    printf "declarations   %10d\n", count["declarations"] >> "totalO"
    printf "default        %10d\n", count["default"]      >> "totalO"

    #  calculate the weights
    weights  = 18.4 * count["declarations"] + 11.4 * count["if"]
    weights +=  7.9 * count["for"]          +  8.5 * count["while"]
    weights +=  6.8 * count["switch"]       +  5.6 * count["case"]
    weights +=  4.6 * count["preprocessor"] + 11.1 * count["goto"]
    weights +=  2.4 * count["comments"]

    printf "\n"                                           >> "totalO"
    printf "Weights =          %10.5f\n", weights         >> "totalO"
    printf "Lines of code =    %10d\n", NR                >> "totalO"
    printf "Weights/Lines =    %10.5f\n", (weights/NR)    >> "totalO"
    printf "\n"                                           >> "totalO"
}
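
The weight formula at the end of the listing can be restated compactly. The sketch below uses the nine coefficients that appear in the "calculate the weights" step; the pairing of each coefficient with a statement type is read off the column order of the listing and should be treated as an assumption. The names `WEIGHTS`, `module_weight`, and `weight_per_line` are illustrative.

```python
# Coefficient per statement type, as read from the awk listing (assumed pairing).
WEIGHTS = {
    "declarations": 18.4,
    "if": 11.4,
    "for": 7.9,
    "while": 8.5,
    "switch": 6.8,
    "case": 5.6,
    "preprocessor": 4.6,
    "goto": 11.1,
    "comments": 2.4,
}


def module_weight(counts):
    """Weighted sum of statement counts; missing statement types count as 0."""
    return sum(coef * counts.get(kind, 0) for kind, coef in WEIGHTS.items())


def weight_per_line(counts, lines_of_code):
    """The listing's 'Weights/Lines' figure."""
    return module_weight(counts) / lines_of_code


# Hypothetical statement counts for one module, for illustration only.
example_counts = {"if": 2, "for": 1}
print(round(module_weight(example_counts), 1))
```

Statement types that the listing counts but does not weight (for example `else`, `break`, and `return`) are ignored here, as they are in the listing's formula.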

APPENDIX  C.     STATEMENT  TYPES


Statement  Types 
(123  modules) 

ABSTRACT


Software maintenance of computer systems has been an
important task. Current demands require the development of good
tools for evaluating software during maintenance and enhancement.
The maintenance process is not well understood so far. The first
step of my research analyzed the relation of changes between the
C modules of Unix System3 and System5. The analysis will help in
evaluating and identifying changes within modules. One goal of
this research is the development of a measure to predict where
software changes are likely to occur.

The results section of the paper describes the relationships
among several predictors such as lines of code, weight, and
nesting levels. The concluding section presents the evaluation
of the predictors.


